Data Management Plan

A data management plan or DMP is a formal document that outlines how data are to be handled both during a research project and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this may lead to data being well managed in the present and prepared for preservation in the future. DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". In the 2000s and later, e-research and economic policies drove the development and uptake of DMPs.


Importance

Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, well organized, and better annotated. This could arguably save time in the long term because there is no need to re-organize or re-format data, or to try to remember details about them. It is also claimed to increase research efficiency, since both the data collector and other researchers might be able to understand and use well-annotated data in the future.

One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make their future submission to a database easier. If data are preserved, they are more relevant since they can be re-used by other researchers. Archiving also allows the data collector to direct requests for data to the database rather than address requests individually. A frequent argument in favor of preservation is that preserved data have the potential to lead to new, unanticipated discoveries and can prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector.

In the 2010s, funding agencies increasingly required data management plans as part of the proposal and evaluation process, despite little or no evidence of their efficacy.


Major components

"There is no general and definitive list of topics that should be covered in a DMP for a research project", and researchers are often left to their own devices as to how to fill out a DMP.


Information about data & data format

* A description of data to be produced by the project. This might include (but is not limited to) data that are:
** Experimental
** Observational
** Raw or derived
** Physical collections
** Models
** Simulations
** Curriculum materials
** Software
** Images
* How will the data be acquired? When and where will they be acquired?
* After collection, how will the data be processed? Include information about
** Software used
** Algorithms
** Scientific workflows
* File formats that will be used, justify those formats, and describe the naming conventions used.
* Quality assurance & quality control measures that will be taken during sample collection, analysis, and processing.
* If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data?
* How will the data be managed in the short-term? Consider the following:
** Version control for files
** Backing up data and data products
** Security & protection of data and data products
** Who will be responsible for management
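
As an illustration of the naming-convention and backup points above, the following Python sketch builds file names from a fixed pattern and compares checksums of an original and its backup copy. The pattern `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>`, the function names, and the sample values are all hypothetical; a real plan would document whatever convention the project actually adopts.

```python
import hashlib
import re
from datetime import date

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_filename(project: str, datatype: str, collected: date,
                   version: int, ext: str) -> str:
    """Assemble a file name following the illustrative convention above."""
    return (f"{project.lower()}_{datatype.lower()}_"
            f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}")


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the illustrative pattern."""
    return NAME_PATTERN.match(filename) is not None


def sha256_of(content: bytes) -> str:
    """Checksum that can confirm a backup copy is identical to the original."""
    return hashlib.sha256(content).hexdigest()


# Made-up example values, for demonstration only.
name = build_filename("SoilSurvey", "raw", date(2024, 3, 7), 1, "csv")
original = b"site,ph\nA,6.5\n"
backup = bytes(original)

print(name)
print(follows_convention(name))
print(sha256_of(original) == sha256_of(backup))
```

A machine-checkable pattern of this kind is one possible way to make a stated convention enforceable; the checksum comparison only shows that two copies match, not that either is correct.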


Metadata content and format

Metadata are the contextual details, including any information important for using data. This may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata is commonly referred to as "data about data". Issues to be considered include:
* How detailed must the metadata be in order to make the data meaningful?
* How will the metadata be created and/or captured? Examples include lab notebooks, GPS hand-held units, auto-saved files on instruments, etc.
* What format will be used for the metadata? What are the metadata standards commonly used in the respective scientific discipline? There should be justification for the format chosen.
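
To make the idea of a metadata record concrete, here is a minimal Python sketch that stores the kinds of details listed above (temporal and spatial coverage, instrument, parameters with units) and reports which ones are missing. The field names and the `missing_fields` helper are assumptions made for illustration only; they do not follow any particular metadata standard, and a real project would use the standard common in its discipline.

```python
import json

# Hypothetical required fields, loosely based on the details listed above.
# A real project would adopt its discipline's metadata standard instead.
REQUIRED_FIELDS = (
    "title",
    "temporal_coverage",
    "spatial_coverage",
    "instrument",
    "parameters",
)


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up example record, for demonstration only.
record = {
    "title": "Example stream temperature series",
    "temporal_coverage": {"start": "2024-03-01", "end": "2024-03-31"},
    "spatial_coverage": {"lat": 0.0, "lon": 0.0},
    "instrument": "hypothetical temperature logger",
    "parameters": [{"name": "water_temperature", "unit": "degC"}],
}

# Serialize as a JSON "sidecar" that could be stored next to the data file.
sidecar = json.dumps(record, indent=2, sort_keys=True)

print(missing_fields(record))
print(sidecar)
```

Keeping the record in a plain-text sidecar file is only one option; the point is that the level of detail and the format are decided, and justified, before collection starts.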


Policies for access, sharing, and re-use

* Describe any obligations that exist for sharing data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements.
* Include information about how data will be shared, including when the data will be accessible, how long the data will be available, how access can be gained, and any rights that the data collector reserves for using data.
* Address any ethical or privacy issues with data sharing.
* Address intellectual property & copyright issues. Who owns the copyright? What are the institutional, publisher, and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial, or patent reasons?
* Describe the intended future uses/users for the data.
* Indicate how the data should be cited by others. How will the issue of persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset have a digital object identifier (DOI) assigned to it?


Long-term storage and data management

* Researchers should identify an appropriate archive for the long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed, and documented appropriately to meet the requirements of the archive. Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate database, and include a backup archive in their data management plan in case their first choice goes out of existence.
* Early in the project, the primary researcher should identify what data will be preserved in an archive. Usually, preserving the data in its most raw form is desirable, although data derivatives and products can also be preserved.
* An individual should be identified as the primary contact person for archived data, and ensure contact information is always kept up-to-date in case there are requests for data or information about data.


Budget

Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses that should be considered are:
* Personnel time for data preparation, management, documentation, and preservation
* Hardware and/or software needed for data management, backing up, security, documentation, and preservation
* Costs associated with submitting the data to an archive
The data management plan should include how these costs will be paid.
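
The three expense categories above can be combined into a rough estimate. The Python sketch below is one hedged way to do that arithmetic; the `DataBudget` class, its field names, and every figure in the example are invented for illustration and are not drawn from any funder's guidance.

```python
from dataclasses import dataclass


@dataclass
class DataBudget:
    """Illustrative cost model covering the three categories listed above."""
    personnel_hours: float    # time for preparation, documentation, preservation
    hourly_rate: float        # assumed cost per personnel hour
    hardware_software: float  # storage, backup, and security tooling
    archive_fee: float        # cost of submitting the data to an archive

    def total(self) -> float:
        """Sum personnel, hardware/software, and archive costs."""
        personnel = self.personnel_hours * self.hourly_rate
        return personnel + self.hardware_software + self.archive_fee

    def share_of(self, project_total: float) -> float:
        """Fraction of the overall project budget spent on data management."""
        return self.total() / project_total


# Made-up figures, for demonstration only.
budget = DataBudget(
    personnel_hours=120,
    hourly_rate=40.0,
    hardware_software=1500.0,
    archive_fee=500.0,
)

print(budget.total())
print(budget.share_of(100_000.0))
```

Expressing the estimate as a share of the whole project budget is a design choice of this sketch; it can help show in the plan how the costs will be paid.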


NSF Data Management Plan

All grant proposals submitted to the National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following:
# The types of data
# The standards to be used for data and metadata format and content
# Policies for access and sharing
# Policies and provisions for re-use
# Plans for archiving data
Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results):
# Promptly publish with appropriate authorship
# Share data, samples, physical collections, and supporting materials with others, within a reasonable time frame
# Share software and inventions
# Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others
# Policies will be implemented via
## Proposal review
## Award negotiations and conditions
## Support/incentives


ESRC Data Management Plan

Since 1995, the UK's Economic and Social Research Council (ESRC) has had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better-quality data that are ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. ESRC has a longstanding arrangement with the UK Data Archive, based at the University of Essex, as a place of deposit for research data, with award holders required to offer data resulting from their research grants via the UK Data Service. The Archive enables data re-use by preserving data and making them available to the research and teaching communities.


Benefits

There are three major themes identified in the literature in terms of benefits of DMPs: professional benefits, economic benefits and institutional benefits. It has been argued that DMPs can form a catalyst for researchers to improve their data literacy and data management practices, often aided by the library.


In practice

In practice, however, DMPs often fall short of their stated goals. A 2012 review of DMP policies by research funders found that policies were missing several elements from the Digital Curation Centre's list of criteria for a DMP. Researchers shared DMP text. DMPs are often regarded as an "administrative exercise rather than an integral part" of the research process, and it has been acknowledged that DMPs do not guarantee good data management practices. Most funders do not require a DMP after grants are awarded, thus robbing stakeholders of the powerful tool that an active DMP can be. Best practice would be to "require maintenance of the data management plan following award and during the active phase of a study." At present, data sharing plans are more important than data management plans to funders.


See also

* Data sharing
* DMPTool


Further reading

* Pryor, Graham (2014). Delivering research data management services. Facet Publishing. ISBN 9781856049337.


External links


* Data Stewardship Wizard: Create Smart Data Management Plans for FAIR Open Science
* DataONE
* DMPonline
* Digital Curation Centre
* NSF Grant Proposal Guidelines
* LTER Blog: How to write a data management plan
* UK Data Service: Prepare and Manage Data: Guidance and tools for social science researchers
* Plan de Gestión de Datos PaGoDa: DMP Toolkit of The Consortium of Universities of the Region of Madrid and the UNED for Library Cooperation (Madroño - Spain)